1 Working with website data

We’ll compare the RCurl and httr packages for downloading web data, read and write XML for the SOCR Main Page, and scrape its content. First, we need to ensure the necessary packages are installed and loaded.

#install.packages(c("RCurl", "httr", "rvest", "XML"))

1.1 Downloading SOCR Wiki Page

1.1.1 Using RCurl

library(RCurl)
web <- "https://wiki.socr.umich.edu/index.php/Main_Page"
socr_rcurl <- getURL(web, followlocation = TRUE)
str(socr_rcurl, nchar.max = 200)
##  chr "<!DOCTYPE html>\n<html class=\"client-nojs\" lang=\"en\" dir=\"ltr\">\n<head>\n<meta charset=\"UTF-8\"/>\n<title>SOCR</title>\n<script>document.documentElement.className = document.do"| __truncated__

1.1.2 Using httr

library(httr)
socr_httr <- GET("https://wiki.socr.umich.edu/index.php/Main_Page")
str(socr_httr[1:3])
## List of 3
##  $ url        : chr "https://wiki.socr.umich.edu/index.php/Main_Page"
##  $ status_code: int 200
##  $ headers    :List of 15
##   ..$ date                  : chr "Fri, 29 Sep 2023 11:14:07 GMT"
##   ..$ server                : chr "Apache"
##   ..$ x-powered-by          : chr "PHP/7.3.11"
##   ..$ x-content-type-options: chr "nosniff"
##   ..$ content-language      : chr "en"
##   ..$ x-ua-compatible       : chr "IE=Edge"
##   ..$ link                  : chr "</local/images/SOCR_3D_Logo_UM.png?55fd6>;rel=preload;as=image"
##   ..$ vary                  : chr "Accept-Encoding,Cookie"
##   ..$ expires               : chr "Thu, 01 Jan 1970 00:00:00 GMT"
##   ..$ cache-control         : chr "private, must-revalidate, max-age=0"
##   ..$ last-modified         : chr "Wed, 03 May 2023 17:49:16 GMT"
##   ..$ content-encoding      : chr "gzip"
##   ..$ content-length        : chr "6824"
##   ..$ content-type          : chr "text/html; charset=UTF-8"
##   ..$ set-cookie            : chr "LBSESSIONID=1392759693.47873.0000; path=/; Httponly; Secure; SameSite=none"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"

Both methods retrieve the same page, but httr returns a structured response object (URL, status code, parsed headers) rather than a raw character string, making it the more user-friendly interface.

1.2 Read and Write XML

library(XML)
web.parsed <- htmlParse(socr_rcurl, asText = TRUE, encoding="UTF-8")
plain.text <- xpathSApply(web.parsed, "//p", xmlValue)
cleaned.text <- paste(plain.text, collapse = "\n")
cleaned.text <- gsub("\n+", "\n", cleaned.text) # Remove multiple newlines
cleaned.text <- gsub("^\\s+|\\s+$", "", cleaned.text) # Remove leading/trailing white spaces
cleaned.text <- gsub("(?<!Translate this page:)\n", " ", cleaned.text, perl = TRUE) # Replace remaining newlines with spaces, except the one after 'Translate this page:'
cleaned.text <- gsub("Translate this page:\n", "Translate this page: ", cleaned.text) # Replace the newline after 'Translate this page:' with a space
substr(cleaned.text, start=1, stop=256)
## [1] "It's online, therefore it exists! Translate this page: (default)  Deutsch  Español Français  Italiano  Português  日本語  България  الامارات العربية المتحدة  Suomi  इस भाषा में  Norge  한국어  中文  繁体中文 Русский Nederlands  Ελληνικά Hrvatska  Česká republika  Danm"
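The cleanup chain above is easier to follow on a small example. The snippet below applies the same four gsub() calls to toy text (hypothetical content, not the actual page):

```r
# Toy text mimicking the scraped structure (hypothetical content)
txt <- "First paragraph.\n\n\nTranslate this page:\nDeutsch\nEspanol"

txt <- gsub("\n+", "\n", txt)                                      # collapse repeated newlines
txt <- gsub("^\\s+|\\s+$", "", txt)                                # trim leading/trailing whitespace
txt <- gsub("(?<!Translate this page:)\n", " ", txt, perl = TRUE)  # newlines -> spaces, except after the label
txt <- gsub("Translate this page:\n", "Translate this page: ", txt)
txt
## [1] "First paragraph. Translate this page: Deutsch Espanol"
```

The lookbehind `(?<!...)` needs `perl = TRUE`, and works here because the guarded phrase has a fixed length.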

1.3 Scrape and Ingest Information

Let’s scrape the SOCR Data page and inspect its title and metadata.

library(rvest)
SOCR<-read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data")
SOCR
## {html_document}
## <html class="client-nojs" lang="en" dir="ltr">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject  ...
SOCR %>% html_node("head title") %>% html_text()
## [1] "SOCR Data - SOCR"
meta <- SOCR %>% html_nodes("head meta") %>% html_attrs()
meta
## [[1]]
##                 http-equiv                    content 
##             "Content-Type" "text/html; charset=UTF-8" 
## 
## [[2]]
## charset 
## "UTF-8" 
## 
## [[3]]
##                          name                       content 
## "ResourceLoaderDynamicStyles"                            "" 
## 
## [[4]]
##               name            content 
##        "generator" "MediaWiki 1.31.6"

2 Network data and visualization

We will be analyzing the network graph of characters from Victor Hugo’s novel “Les Misérables”. First, we’ll load the necessary libraries and the data.

#install.packages("igraph")
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
# Load the graph data
df <- read.table("I:/UBD PB/ZH5103 Predictive/Specialized/03_les miserablese_GraphData.txt", header = FALSE)
g <- graph_from_data_frame(df, directed = FALSE)

2.1 Visualizing the Graph

plot(g, vertex.size=10, vertex.label.cex=0.5, main="Les Misérables Character Network")

2.2 Summarizing the Graph

summary(g)
## IGRAPH 4fdcfe6 UN-- 77 254 -- 
## + attr: name (v/c)

The graph g is an undirected network with 77 vertices and 254 edges. Each vertex carries a single character-type attribute, “name”, which serves as its label. Given the context, the vertices represent characters from “Les Misérables”, and each edge denotes some form of relationship or interaction between a pair of characters.

2.3 Degree and Centrality

2.3.1 Degree

degree_distribution <- sort(degree(g), decreasing = TRUE)
degree_distribution
##          Valjean         Gavroche           Marius           Javert 
##               36               22               19               17 
##       Thenardier          Fantine         Enjolras       Courfeyrac 
##               16               15               15               13 
##          Bossuet          Bahorel             Joly    MmeThenardier 
##               13               12               12               11 
##          Cosette          Eponine           Mabeuf       Combeferre 
##               11               11               11               11 
##          Feuilly           Myriel        Gueulemer            Babet 
##               11               10               10               10 
##       Claquesous        Grantaire        Tholomyes        Prouvaire 
##               10               10                9                9 
##     Montparnasse       Bamatabois        Listolier          Fameuil 
##                9                8                7                7 
##      Blacheville        Favourite           Dahlia          Zephine 
##                7                7                7                7 
##     Gillenormand MlleGillenormand           Brujon     MmeHucheloup 
##                7                7                7                7 
##            Judge     Champmathieu           Brevet       Chenildieu 
##                6                6                6                6 
##      Cochepaille     Fauchelevent   LtGillenormand         Simplice 
##                6                4                4                4 
##   MlleBaptistine      MmeMagloire        Pontmercy          Anzelma 
##                3                3                3                3 
##           Woman2        Toussaint       Marguerite         Perpetue 
##                3                3                2                2 
##        MmeBurgon           Child1           Woman1   MotherInnocent 
##                2                2                2                2 
##           Magnon     MmePontmercy        BaronessT           Child2 
##                2                2                2                2 
##          Labarre        Jondrette         Napoleon     CountessDeLo 
##                1                1                1                1 
##         Geborand     Champtercier         Cravatte            Count 
##                1                1                1                1 
##           OldMan           MmeDeR          Isabeau          Gervais 
##                1                1                1                1 
##      Scaufflaire     Boulatruelle          Gribier      MlleVaubois 
##                1                1                1                1 
##   MotherPlutarch 
##                1

The degree of a node in an undirected graph is the number of connections it has.
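The same quantity can be computed from a raw edge list without igraph. A base-R sketch on a toy edge list (hypothetical names):

```r
# Toy undirected edge list (hypothetical characters)
edges <- data.frame(from = c("A", "A", "B", "C"),
                    to   = c("B", "C", "C", "D"))

# Each undirected edge contributes one to the degree of both endpoints,
# so pool both columns before counting
deg <- table(c(edges$from, edges$to))
sort(deg, decreasing = TRUE)
##
## C A B D
## 3 2 2 1
```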

The graph represents the interactions between characters in “Les Misérables”. The most connected character is Valjean with 36 connections, followed by Gavroche with 22 and Marius with 19. At the other end of the spectrum, several characters, such as Labarre, Jondrette, Napoleon, and CountessDeLo, have only a single connection. Most characters have multiple interactions, with the majority having more than 5 connections.

2.3.2 Centrality

centrality <- sort(betweenness(g), decreasing = TRUE)
centrality
##          Valjean           Myriel         Gavroche           Marius 
##     1624.4688004      504.0000000      470.5706319      376.2925926 
##          Fantine       Thenardier           Javert MlleGillenormand 
##      369.4869418      213.4684805      154.8449450      135.6569444 
##         Enjolras        Tholomyes          Bossuet    MmeThenardier 
##      121.2770669      115.7936423       87.6479030       82.6568934 
##           Mabeuf     Fauchelevent        MmeBurgon          Cosette 
##       78.8345238       75.5000000       75.0000000       67.8193223 
##     Gillenormand          Eponine         Simplice       Bamatabois 
##       57.6002715       32.7395194       24.6248408       22.9166667 
##        Pontmercy       Courfeyrac        Gueulemer            Babet 
##       19.7375000       15.0110352       14.1370943       14.1370943 
##       Claquesous     Montparnasse          Bahorel             Joly 
##       13.8561420       11.0404151        6.2286417        6.2286417 
##       Combeferre          Feuilly           Brujon     MmePontmercy 
##        3.5629149        3.5629149        1.2500000        1.0000000 
##           Magnon        Grantaire   MlleBaptistine          Labarre 
##        0.6190476        0.4285714        0.0000000        0.0000000 
##      MmeMagloire        Listolier          Fameuil      Blacheville 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##        Favourite           Dahlia          Zephine       Marguerite 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##         Perpetue            Judge     Champmathieu           Brevet 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##       Chenildieu        Jondrette   LtGillenormand        Prouvaire 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Child1         Napoleon     CountessDeLo         Geborand 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##     Champtercier         Cravatte            Count           OldMan 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           MmeDeR          Isabeau          Gervais      Scaufflaire 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Woman1      Cochepaille     Boulatruelle          Anzelma 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Woman2   MotherInnocent          Gribier      MlleVaubois 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##        BaronessT   MotherPlutarch        Toussaint           Child2 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##     MmeHucheloup 
##        0.0000000

Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes; high values mark nodes that act as bridges between parts of the network.

In the character network of “Les Misérables”, Valjean emerges as the most central figure with a betweenness centrality score of 1624.47, signifying his role as a key bridge or intermediary within the network. The character Myriel is next with a score of 504, followed closely by Gavroche with a score of 470.57. This suggests that these characters are crucial connectors in the story, linking different clusters or groups of characters. Conversely, many characters, such as Labarre, MmeMagloire, and Listolier, among others, have a centrality score of zero, indicating that they don’t act as bridges or intermediaries in the network.

2.4 Important Nodes

Nodes with higher degrees and centrality scores are generally more important in the network.

degree_distribution[1:10]
##    Valjean   Gavroche     Marius     Javert Thenardier    Fantine   Enjolras 
##         36         22         19         17         16         15         15 
## Courfeyrac    Bossuet    Bahorel 
##         13         13         12
centrality[1:10]
##          Valjean           Myriel         Gavroche           Marius 
##        1624.4688         504.0000         470.5706         376.2926 
##          Fantine       Thenardier           Javert MlleGillenormand 
##         369.4869         213.4685         154.8449         135.6569 
##         Enjolras        Tholomyes 
##         121.2771         115.7936

The output above lists the top 10 characters by degree and by betweenness centrality, indicating their importance in the network. The two rankings largely agree, though Myriel ranks second in betweenness despite a modest degree of 10, marking him as a bridge rather than a hub.

2.5 Implications of a Directed Graph

If we were to assume the graph is directed, the relationships would be one-way, meaning one character interacts with another, but it may not be reciprocated. This would result in the calculation of in-degree (incoming connections) and out-degree (outgoing connections) for each node. The interpretations of centrality measures would also be different in the context of directed relationships.
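For instance, in-degree and out-degree can be tallied from a directed edge list with base R alone (toy data, hypothetical names):

```r
# Toy directed edge list: each row is from -> to (hypothetical)
edges <- data.frame(from = c("A", "A", "B"),
                    to   = c("B", "C", "A"))

# Using factors with fixed levels so nodes with zero count still appear
out_deg <- table(factor(edges$from, levels = c("A", "B", "C")))
in_deg  <- table(factor(edges$to,   levels = c("A", "B", "C")))
out_deg  # A=2, B=1, C=0
in_deg   # A=1, B=1, C=1
```

Here A initiates two interactions but receives only one, a distinction that an undirected graph would erase.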

3 Data conversion and parallel computing

We’ll be analyzing the data on adults with heart attacks, performing data processing, model training, and parallel computing. Install and load the necessary packages.

#install.packages(c("openxlsx", "rio", "data.table","ff","ffbase", "dplyr", "randomForest", "caret", "parallel", "doParallel", "foreach"))

3.1 Load Data as DataFrame

library(openxlsx)
data <- read.xlsx("I:/UBD PB/ZH5103 Predictive/Specialized/CaseStudy12_ AdultsHeartAttack_Data (1).xlsx")
head(data)
##   Patient DIAGNOSIS SEX DRG DIED CHARGES LOS AGE
## 1       1     41041   F 122    0    4752  10  79
## 2       2     41041   F 122    0    3941   6  34
## 3       3     41091   F 122    0    3657   5  76
## 4       4     41081   F 122    0    1481   2  80
## 5       5     41091   M 122    0    1681   1  55
## 6       6     41091   M 121    0    6379   9  84

3.2 Renew the xlsx File

write.xlsx(data, "Renewed_HeartAttack_Data.xlsx")

3.3 Convert xlsx to csv Using rio

library(rio)
export(data, "HeartAttack_Data.csv")

3.4 Generate a data.table

library(data.table)
dt_data <- as.data.table(data)
head(dt_data)
##    Patient DIAGNOSIS SEX DRG DIED CHARGES LOS AGE
## 1:       1     41041   F 122    0    4752  10  79
## 2:       2     41041   F 122    0    3941   6  34
## 3:       3     41091   F 122    0    3657   5  76
## 4:       4     41081   F 122    0    1481   2  80
## 5:       5     41091   M 122    0    1681   1  55
## 6:       6     41091   M 121    0    6379   9  84

3.5 Disk-based Data Frames

library(ff)
## Loading required package: bit
## 
## Attaching package: 'bit'
## The following object is masked from 'package:data.table':
## 
##     setattr
## The following object is masked from 'package:RCurl':
## 
##     clone
## The following object is masked from 'package:base':
## 
##     xor
## Attaching package ff
## - getOption("fftempdir")=="C:/Users/Jun/AppData/Local/Temp/Rtmpo9X3KW/ff"
## - getOption("ffextension")=="ff"
## - getOption("ffdrop")==TRUE
## - getOption("fffinonexit")==TRUE
## - getOption("ffpagesize")==65536
## - getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" if your system stalls on large writes
## - getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system
## - getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system
## 
## Attaching package: 'ff'
## The following objects are masked from 'package:utils':
## 
##     write.csv, write.csv2
## The following objects are masked from 'package:base':
## 
##     is.factor, is.ordered
ha1<-read.csv.ffdf(file="HeartAttack_Data.csv", header=T)
mean(ha1$Age)
## [1] NA

The NA arises because base R functions such as mean() cannot be applied directly to ff columns (note also that the column is named AGE, not Age, so ha1$Age returns NULL). The ffbase package, loaded next, supplies the required methods.

3.6 Basic Calculation on the Last 5 Columns

library(ffbase)
## Registered S3 methods overwritten by 'ffbase':
##   method   from
##   [.ff     ff  
##   [.ffdf   ff  
##   [<-.ff   ff  
##   [<-.ffdf ff
## 
## Attaching package: 'ffbase'
## The following objects are masked from 'package:base':
## 
##     %in%, table
ha2<-read.csv.ffdf(file="HeartAttack_Data.csv", header=T)
str(ha2[,4:8])
## 'data.frame':    150 obs. of  5 variables:
##  $ DRG    : int  122 122 122 122 122 121 121 121 121 123 ...
##  $ DIED   : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ CHARGES: Factor w/ 146 levels ".","10003","10020",..: 86 76 69 25 33 99 11 32 78 38 ...
##  $ LOS    : int  10 6 5 2 1 9 15 15 2 1 ...
##  $ AGE    : int  79 34 76 80 55 84 84 70 76 65 ...
mean(ha2$DRG)
## [1] 121.7533
mean(ha2$DIED)
## [1] 0.1066667
table(ha2$CHARGES)
## 
##     . 10003 10020 10089 10173  1035 10530  1070 10750  1095 10959 11080 11163 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
## 11969  1205  1229 12403 12494  1251  1354  1383  1409  1416  1432  1481  1505 
##     1     1     1     1     1     1     1     1     1     1     1     2     1 
##  1510  1582  1595 16429  1655 16584  1681  1684 17137  1805  1974  1989  2173 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  2227  2267  2274  2308  2423  2441  2521  2556  2584  2646  2671  2750  2886 
##     1     2     1     1     1     1     1     1     1     1     1     1     1 
##  2905  3033  3036  3090  3096  3140  3148  3173  3215  3235  3315  3318  3386 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  3400  3440  3512  3657  3694  3739  3790  3882  3918  3931  3941  3988  4015 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  4068  4101  4174  4317  4392  4434  4662  4752  4864  4896  4974  4990  4994 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  5001  5461  5861  6065  6145  6298  6316  6379  6491  6539  6736  6912  6977 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##   701  7039  7049  7160  7165  7182  7214  7223  7240  7307   733  7330  7429 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  7454  7472  7497  7518  7589  7676  7856  7901  7916  8066   815  8227  8260 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  8274  8358  8596  8638  8646  8732  8817  8830  8858  9091  9117  9155  9158 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  9279  9517  9863 
##     1     1     1
mean(ha2$LOS)
## [1] 4.166667
mean(ha2$AGE)
## [1] 66.32667

Using the ffbase package, we can perform efficient manipulations on large, disk-backed datasets. For integer columns we can compute means directly; CHARGES, however, was imported as a factor because the file codes missing charges as ".", so we tabulate its frequencies instead.
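To compute a numeric summary of CHARGES anyway, the factor can be converted to numeric, mapping the "." code to NA. A base-R sketch on toy values:

```r
# Toy factor column mimicking CHARGES, with "." as the missing-value code
charges <- factor(c("4752", "3941", ".", "1481"))

# Convert via character so factor codes aren't used; "." coerces to NA
charges_num <- suppressWarnings(as.numeric(as.character(charges)))
mean(charges_num, na.rm = TRUE)
## [1] 3391.333
```

Going through as.character() first matters: as.numeric() on a factor returns the internal level codes, not the printed values.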

3.7 Predicting DIED Using RandomForest

First, let’s create a balanced dataset:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:bit':
## 
##     symdiff
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:igraph':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
set.seed(123)
died_data <- data[data$DIED == 1, ]
alive_data <- data[data$DIED == 0, ] %>% sample_n(size = nrow(died_data))
balanced_data <- rbind(died_data, alive_data)
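The same downsampling can be sketched in base R without dplyr (toy data with hypothetical values):

```r
set.seed(123)
# Toy imbalanced outcome: 10 deaths, 90 survivors (hypothetical)
toy <- data.frame(DIED = c(rep(1, 10), rep(0, 90)),
                  AGE  = sample(30:90, 100, replace = TRUE))

died  <- toy[toy$DIED == 1, ]
alive <- toy[toy$DIED == 0, ]
alive_sampled <- alive[sample(nrow(alive), nrow(died)), ]  # downsample majority class

balanced <- rbind(died, alive_sampled)
table(balanced$DIED)  # 10 of each class
```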

Now, train the random forest model:

library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
set.seed(123)
rf_model <- randomForest(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, ntree = 20000)
rf_model
## 
## Call:
##  randomForest(formula = DIED ~ DIAGNOSIS + SEX + DRG + CHARGES +      LOS + AGE, data = balanced_data, ntree = 20000) 
##                Type of random forest: regression
##                      Number of trees: 20000
## No. of variables tried at each split: 2
## 
##           Mean of squared residuals: 0.02157022
##                     % Var explained: 91.37
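Note that the model above is a regression forest: DIED is stored as 0/1 numbers, so randomForest() treats the target as continuous. If classification is intended, the response should be converted to a factor first; a minimal sketch of the conversion (toy values):

```r
# DIED as 0/1 numbers makes randomForest fit a regression forest;
# a factor response makes it a classification forest instead
died_numeric <- c(0, 1, 0, 1)          # toy values
died_factor  <- factor(died_numeric, levels = c(0, 1), labels = c("alive", "died"))
is.factor(died_factor)   # TRUE
levels(died_factor)      # "alive" "died"
```

With a factor response, randomForest() would report "Type of random forest: classification" and out-of-bag error rates rather than mean squared residuals.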

3.8 Train Model Using caret

First, we set up training control:

library(parallel)
library(caret)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
## 
##     margin
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:httr':
## 
##     progress
set.seed(123)
train_control <- trainControl(method = "cv", number = 5)
system.time({
  model_caret <- train(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, method = "rf", trControl = train_control, tuneLength = 5)
})
##    user  system elapsed 
##    0.67    0.04    0.71

In R, system.time() measures code execution with “user” time (active CPU duration), “system” time (CPU time for system operations), and “elapsed” time (total real-world duration).

In general, when you’re trying to assess how long your code takes to run from a user’s perspective, the elapsed time is the most relevant. But the breakdown into user and system can give insights into where the time is spent (in user-level computation vs. system-level calls).
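The distinction is easy to demonstrate with a call that waits rather than computes: wall-clock (elapsed) time accumulates while CPU (user) time stays near zero.

```r
# Sleeping consumes wall-clock time but almost no CPU time
t <- system.time(Sys.sleep(0.2))
t["elapsed"]    # about 0.2 seconds of real time
t["user.self"]  # near 0: the CPU was idle while sleeping
```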

3.9 Parallelized Training

Create a cluster with all detected cores and register it as the parallel backend (caret's allowParallel option only takes effect when a backend is registered):

cl <- makeCluster(detectCores())
doParallel::registerDoParallel(cl)

Train the model in parallel:

set.seed(123)
train_control_parallel <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

system.time({
  model_caret_parallel <- train(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, method = "rf", trControl = train_control_parallel, tuneLength = 5)
})
##    user  system elapsed 
##    0.66    0.03    0.68
stopCluster(cl)

Both the standard caret run and the parallelized run were closely matched, with the parallelized training marginally quicker on all three measures (user 0.66 vs. 0.67, system 0.03 vs. 0.04, elapsed 0.68 vs. 0.71 seconds). On a dataset this small, the overhead of distributing work largely cancels out the gains, so the difference is minimal.

3.10 Parallelized RandomForest Using foreach and doParallel

set.seed(123)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
cl <- makeCluster(detectCores())
registerDoParallel(cl)

library(foreach)
system.time({
  rf_model_parallel <- foreach(ntree = rep(20000 %/% detectCores(), detectCores()), .combine = combine, .packages = "randomForest") %dopar% {
    randomForest(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, ntree = ntree)
  }
})
##    user  system elapsed 
##    0.02    0.04    0.62
stopCluster(cl)
registerDoSEQ()

In the computational comparison, the standard caret run and the parallelized caret run had very close elapsed times, with the parallel version slightly faster: the standard method trains on a single core, while allowParallel = TRUE distributes the cross-validation folds across cores, which pays off mainly on larger datasets. The clearest improvement came from foreach with doParallel, which was substantially more time-efficient than both. It divides the total number of trees (20,000 here) across the available cores; each core grows its own forest, and the forests are then merged with combine. This minimizes scheduling overhead and maximizes parallel efficiency.
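The tree-splitting idea can be sketched without any cluster: the total tree count is divided evenly across workers, and the per-worker forests are later merged with randomForest's combine().

```r
total_trees <- 20000
n_workers   <- 4   # e.g., detectCores() on a quad-core machine (hypothetical)

# Each worker grows an equal share of the ensemble
per_worker <- rep(total_trees %/% n_workers, n_workers)
per_worker       # 5000 5000 5000 5000
sum(per_worker)  # 20000, the full ensemble size
```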

4 Image Processing with R, C++, and Python

4.1 Load a 2D Image in R

#install.packages(c("imager","Rcpp","reticulate","plotly"))
library(imager)
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:ff':
## 
##     add
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following object is masked from 'package:randomForest':
## 
##     grow
## The following object is masked from 'package:dplyr':
## 
##     where
## The following object is masked from 'package:ff':
## 
##     add
## The following object is masked from 'package:igraph':
## 
##     spectrum
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
img <- load.image("I:/UBD PB/ZH5103 Predictive/Specialized/MRI_ImageHematoma.jpg")
plot(img)

4.2 Convolve Image with 2D Gaussian Kernel (C++)

Now, let’s create a Gaussian kernel and use the C++ function to convolve the image. We will use the Rcpp package to write and execute C++ code.

library(Rcpp)
cppFunction('
NumericMatrix gaussianSmooth(NumericMatrix img) {
    int rows = img.nrow();
    int cols = img.ncol();
    NumericMatrix result(rows, cols);
    
    // Define a 2D Gaussian kernel
    NumericMatrix kernel(3,3);
    kernel(0,0) = 1/16.0; kernel(0,1) = 2/16.0; kernel(0,2) = 1/16.0;
    kernel(1,0) = 2/16.0; kernel(1,1) = 4/16.0; kernel(1,2) = 2/16.0;
    kernel(2,0) = 1/16.0; kernel(2,1) = 2/16.0; kernel(2,2) = 1/16.0;
    
    // Convolve the image with the kernel
    for(int i = 1; i < (rows - 1); i++) {
        for(int j = 1; j < (cols - 1); j++) {
            double sum = 0;
            for(int ki = -1; ki <= 1; ki++) {
                for(int kj = -1; kj <= 1; kj++) {
                    sum += img(i + ki, j + kj) * kernel(ki + 1, kj + 1);
                }
            }
            result(i, j) = sum;
        }
    }
    
    return result;
}
')

img_gray <- grayscale(img)
img_matrix <- as.matrix(img_gray)
smoothed_img <- gaussianSmooth(img_matrix)
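For readers without a C++ toolchain, an equivalent (much slower) base-R version of the same 3x3 convolution is sketched below; it is handy for cross-checking the Rcpp output:

```r
gaussian_smooth_r <- function(img) {
  # Same 3x3 Gaussian kernel as the C++ version, summing to 1
  kernel <- matrix(c(1, 2, 1,
                     2, 4, 2,
                     1, 2, 1) / 16, nrow = 3, byrow = TRUE)
  out <- matrix(0, nrow(img), ncol(img))  # one-pixel border stays 0, as in C++
  for (i in 2:(nrow(img) - 1)) {
    for (j in 2:(ncol(img) - 1)) {
      out[i, j] <- sum(img[(i - 1):(i + 1), (j - 1):(j + 1)] * kernel)
    }
  }
  out
}

# Sanity check: a constant interior is unchanged by a kernel that sums to 1
m <- matrix(1, 5, 5)
gaussian_smooth_r(m)[3, 3]  # 1
```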

4.3 Display Both Original and Smoothed Images Using plot_ly

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:imager':
## 
##     highlight
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:rio':
## 
##     export
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:httr':
## 
##     config
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# Original Image
p1 <- plot_ly(
  x = 1:ncol(img_matrix),
  y = 1:nrow(img_matrix),
  z = img_matrix,
  type = "heatmap",
  colorscale = "Greys",
  zmin = 0,
  zmax = 1
)

# Smoothed Image
p2 <- plot_ly(
  x = 1:ncol(smoothed_img),
  y = 1:nrow(smoothed_img),
  z = smoothed_img,
  type = "heatmap",
  colorscale = "Greys",
  zmin = 0,
  zmax = 1
)

combined_plot <- subplot(p1, p2)

# Add titles
combined_plot %>% layout(
  annotations = list(
    list(
      text = "Original Image",
      xref = "paper",
      yref = "paper",
      x = 0.15,  # Relative position within the plot area
      y = 1.05,  # Relative position above the plot area
      showarrow = F,
      font = list(size = 14)
    ),
    list(
      text = "Smoothed Image",
      xref = "paper",
      yref = "paper",
      x = 0.85,  # Relative position within the plot area
      y = 1.05,  # Relative position above the plot area
      showarrow = F,
      font = list(size = 14)
    )
  )
)

4.4 Retrieve both Original and Smoothed Images in Python and Compute their Difference

import numpy as np

#Retrieve both Original and Smoothed Images in Python
img_matrix_python = np.array(r.img_matrix)
smoothed_img_python = np.array(r.smoothed_img)

# Compute the difference between the original and smoothed images
difference_img = img_matrix_python - smoothed_img_python
difference_img
## array([[0.00392157, 0.00392157, 0.00392157, ..., 0.00392157, 0.00392157,
##         0.00392157],
##        [0.00392157, 0.        , 0.        , ..., 0.        , 0.        ,
##         0.00392157],
##        [0.00392157, 0.        , 0.        , ..., 0.        , 0.        ,
##         0.00392157],
##        ...,
##        [0.00392157, 0.        , 0.        , ..., 0.        , 0.        ,
##         0.00392157],
##        [0.00392157, 0.        , 0.        , ..., 0.        , 0.        ,
##         0.00392157],
##        [0.00392157, 0.00392157, 0.00392157, ..., 0.00392157, 0.00392157,
##         0.00392157]])

The Python code computes the pixel-wise difference between the original and smoothed images to highlight where smoothing changed the data. Note that 0.00392157 = 1/255, one gray level on the normalized [0, 1] scale. The border rows and columns show this value because the C++ routine leaves a one-pixel border of the result at zero, so the border difference simply equals the original intensity there; the interior zeros correspond to flat dark regions that the smoothing left essentially unchanged.

4.5 Display Original, Smoothed and Difference Images in 2D and 3D using plot_ly

# Pull the difference image from Python
library(reticulate)
## 
## Attaching package: 'reticulate'
## The following object is masked from 'package:rio':
## 
##     import
diff_img <- py$difference_img

# Display all three images using plot_ly in 2D
p1 <- plot_ly(z = as.matrix(img_gray), type = "heatmap")
p2 <- plot_ly(z = smoothed_img, type = "heatmap")
p3 <- plot_ly(z = diff_img, type = "heatmap")

combined_plot<-subplot(p1, p2, p3)

# Add titles
combined_plot %>% layout(
  annotations = list(
    list(
      text = "Original Image",
      xref = "paper",
      yref = "paper",
      x = 0.1,  # Relative position within the plot area
      y = 1.05,  # Relative position above the plot area
      showarrow = F,
      font = list(size = 14)
    ),
    list(
      text = "Smoothed Image",
      xref = "paper",
      yref = "paper",
      x = 0.5,  # Relative position within the plot area
      y = 1.05,  # Relative position above the plot area
      showarrow = F,
      font = list(size = 14)
    ),
    list(
      text = "Difference Image",
      xref = "paper",
      yref = "paper",
      x = 0.95,  # Relative position within the plot area
      y = 1.05,  # Relative position above the plot area
      showarrow = F,
      font = list(size = 14)
    )
  )
)

The reticulate package in R facilitates seamless integration between R and Python. By installing and loading reticulate, users can execute Python code directly within an R environment, convert R data structures to Python, and retrieve Python objects back into R. This interplay allows for the harnessing of capabilities from both languages in a single workflow, bridging the gap between two of the most popular data science languages.

The R code visualizes the original, smoothed, and difference images side by side using the plot_ly function. It highlights the raw data in the original image, the effect of Gaussian smoothing in the smoothed image, and the specific changes brought about by the smoothing in the difference image. This side-by-side comparison allows for an immediate understanding of the impact of the smoothing process on the image.

p1_3d <- plot_ly(z = as.matrix(img_gray), type = "surface") %>% layout(title = "Original Image 3D")
p2_3d <- plot_ly(z = smoothed_img, type = "surface") %>% layout(title = "Smoothed Image 3D")
p3_3d <- plot_ly(z = diff_img, type = "surface") %>% layout(title = "Difference Image 3D")

p1_3d
p2_3d
p3_3d

The R code visualizes the original, smoothed, and difference images in 3D. The “Original Image 3D” displays the MRI’s inherent intensity variations. The “Smoothed Image 3D” shows the effect of Gaussian smoothing, softening the image’s features. The “Difference Image 3D” highlights where the smoothing had the most influence. This 3D perspective underscores the changes brought by the smoothing process, aiding in understanding its impact on the image.